fruits = ['apple', 'banana', 'grapefruit','apple','mango','grapefruit']
with open('vowel_counts.csv', 'w') as output:
for fruit in fruits:
vowel_count = 0
for vowel in 'aeiou':
vowel_count += fruit.count(vowel)
vowel_proportion = vowel_count / len(fruit)
output.write(f'{fruit},{vowel_proportion}\n')Scientific Python antipatterns advent calendar day twenty four
For today, an example involving function design that I’ve often seen in scientific code. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read. Since I am going by tradition, this will be the last post in this series; I will soon post a summary of them all.
If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:
and I’ll send a single email at the end with links to them all.
Functions as containers rather than logical units
Hopefully it’s not too obvious to point out that functions are incredibly useful :) but using them effectively takes a bit of practice. A common scenario in scientific programming is to take some data points, do some calculations, and write the output to a file.
As an example, we will take our familiar list of fruits and write out a CSV file giving the proportion of characters that are lowercase vowels:
If we want to use this logic in multiple places, we can turn this code into a function. Technically all we need to do is this:
def write_vowel_proportions():
with open('vowel_counts.csv', 'w') as output:
for fruit in fruits:
vowel_count = 0
for vowel in 'aeiou':
vowel_count += fruit.count(vowel)
vowel_proportion = vowel_count / len(fruit)
output.write(f'{fruit},{vowel_proportion}\n')
write_vowel_proportions()but intuitively that is not much use as the filename and input list is hardcoded. Adding some arguments will make the function more flexible:
def write_vowel_proportions(fruits, filename):
with open(filename, 'w') as output:
for fruit in fruits:
vowel_count = 0
for vowel in 'aeiou':
vowel_count += fruit.count(vowel)
vowel_proportion = vowel_count / len(fruit)
output.write(f'{fruit},{vowel_proportion}\n')
write_vowel_proportions(fruits, 'vowel_counts.csv')This feels a bit more like a proper function, but it’s still not quite satisfying. If we imagine trying to test it on a bunch of different fruit names, it’s going to be awkward as we have to run the code then check the output file each time. And the logic is a mixture of vowel counting and CSV output formatting.
The crucial part of function design that many beginners miss is what we can have our functions call each other. So if we extract just the vowel-counting bit into its own function:
def vowel_prop(fruit):
vowel_count = 0
for vowel in 'aeiou':
vowel_count += fruit.count(vowel)
vowel_proportion = vowel_count / len(fruit)
return vowel_proportionthen we have a small logical unit of code that we can easily test:
vowel_prop('banana')0.5
We can then modify our original function to call the vowel_prop one:
def write_vowel_proportions(fruits, filename):
with open(filename, 'w') as output:
for fruit in fruits:
output.write(f'{fruit},{vowel_prop(fruit)}\n')As well as being easier to understand and to test, this two-function approach makes it much easier to reuse the core counting logic. For example, we could take a fruit and check to see if it is mostly made up of vowels:
vowel_prop('avocado') > 0.5True
It also makes it much easier to see the generality of our code. For example, we might notice that our function will work for any string, not just fruit, and so change the variable names appropriately:
def vowel_prop(word):
vowel_count = 0
for vowel in 'aeiou':
vowel_count += word.count(vowel)
vowel_proportion = vowel_count / len(word)
return vowel_proportion
vowel_prop('racecar')0.42857142857142855
We might also notice that it will also work for any collection of characters, not just vowels - this is a nice use case for a default function argument:
def character_prop(word, chars='aeiou'):
character_count = 0
for character in 'aeiou':
character_count += word.count(character)
character_proportion = character_count / len(word)
return character_proportion
character_prop('racecar', 'abc')0.42857142857142855
One more time; if you want to see the rest of these little write-ups, sign up for the mailing list: